224 research outputs found

    Zero-Shot Certified Defense against Adversarial Patches with Vision Transformers

    Adversarial patch attacks aim to fool a machine learning model by arbitrarily modifying pixels within a restricted region of an input image. Such attacks are a major threat to models deployed in the physical world, as they can be easily realized by presenting a customized object in the camera view. Defending against them is challenging due to the arbitrariness of patches, and existing provable defenses suffer from poor certified accuracy. In this paper, we propose PatchVeto, a zero-shot certified defense against adversarial patches based on Vision Transformer (ViT) models. Rather than training a robust model to resist adversarial patches, which may inevitably sacrifice accuracy, PatchVeto reuses a pretrained ViT model without any additional training; it achieves high accuracy on clean inputs while detecting adversarially patched inputs simply by manipulating the attention map of the ViT. Specifically, each input is tested by voting over multiple inferences with different attention masks, where at least one inference is guaranteed to exclude the adversarial patch. The prediction is certifiably robust if all masked inferences reach consensus, which ensures that any adversarial patch is detected with no false negatives. Extensive experiments show that PatchVeto achieves high certified accuracy (e.g., 67.1% on ImageNet for 2%-pixel adversarial patches), significantly outperforming state-of-the-art methods. The clean accuracy is the same as that of vanilla ViT models (81.8% on ImageNet) since the model parameters are directly reused. Meanwhile, our method can flexibly handle different adversarial patch sizes by simply changing the masking strategy. Comment: 12 pages, 5 figures
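To make the certification rule concrete, here is a minimal, hypothetical sketch of the consensus-voting idea described above. The mask generation and ViT inference are omitted; `certify_by_consensus` and its inputs are illustrative names, not the authors' code:

```python
def certify_by_consensus(masked_predictions):
    """Return (label, certified). The prediction is certified only when
    every masked inference votes for the same class, so an adversarial
    patch excluded by at least one mask cannot flip the result undetected."""
    majority = max(set(masked_predictions), key=masked_predictions.count)
    certified = all(p == majority for p in masked_predictions)
    return majority, certified

# Consensus across all masks -> certifiably robust prediction.
print(certify_by_consensus(["cat", "cat", "cat"]))  # ('cat', True)
# Disagreement -> a patch may be present; flagged, not certified.
print(certify_by_consensus(["cat", "dog", "cat"]))  # ('cat', False)
```

With this rule, a false negative would require the patch to fool every masked inference, including the one guaranteed to exclude it.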

    Microbial Communities in Water during Red Tides along the Coast of China: A Case Study of a Prorocentrum donghaiense Red Tide in the East China Sea

    Red tides are a major public hazard in the global oceans. The coast of the East China Sea is where red tide disasters are most frequent and serious in China. In order to accurately grasp the occurrence of red tides in the coastal waters of the East China Sea, and to understand the microbial communities in these waters during red tide events, a dedicated survey was carried out in the coastal waters of Zhejiang, China in June 2018. The results showed that nutrient concentrations of N and P were generally high in this area; DIN concentrations in most areas exceeded the permitted limit of Chinese seawater quality Grade I. There were significant differences in dissolved oxygen, pH, COD, chlorophyll, and phytoplankton abundance between the red tide waters and the surrounding waters. During the investigation, red tides were found in the waters near the Yushan Islands. The chlorophyll a content was 42.12 mg/m³, the phytoplankton cell abundance was 8.16×10⁸ cells/L, and Prorocentrum donghaiense accounted for 98.5% of that abundance. The Illumina MiSeq sequencing platform was used for 16S rRNA high-throughput sequencing of the water microorganisms, and a total of 16 bacterial phyla were identified. Proteobacteria was the first dominant phylum, followed by Cyanobacteria and Bacteroidetes. Some differences in bacterial community composition between the harmful algal bloom (HAB) area and the nearby seawater were observed. The predominant bacteria in the red tide occurrence area were Proteobacteria, comprising 46.1% of the relative abundance, while the predominant bacteria in the nearby sea area comprised 42.0% of the relative abundance.

    TSTTC: A Large-Scale Dataset for Time-to-Contact Estimation in Driving Scenarios

    Time-to-Contact (TTC) estimation is a critical task for assessing collision risk and is widely used in various driver-assistance and autonomous driving systems. The past few decades have witnessed the development of related theories and algorithms. The prevalent learning-based methods call for a large-scale TTC dataset collected in real-world scenarios. In this work, we present a large-scale, object-oriented TTC dataset of driving scenes to promote TTC estimation with a monocular camera. To collect valuable samples and keep data with different TTC values relatively balanced, we went through thousands of hours of driving data and selected over 200K sequences following a preset data distribution. To augment the number of small-TTC cases, we also generate clips using recent neural rendering methods. Additionally, we provide several simple yet effective TTC estimation baselines and evaluate them extensively on the proposed dataset to demonstrate their effectiveness. The proposed dataset is publicly available at https://open-dataset.tusen.ai/TSTTC. Comment: 19 pages, 9 figures
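As background for monocular TTC estimation, the classic approximation infers TTC from the growth rate of an object's image size: if a bounding box scales by s = w_curr / w_prev over an interval dt at constant closing speed, then TTC ≈ dt / (s − 1). A minimal illustrative sketch (`ttc_from_scale` is an assumed name, not necessarily one of the paper's baselines):

```python
def ttc_from_scale(w_prev, w_curr, dt):
    """Estimate time-to-contact from the growth of an object's image width,
    assuming constant closing speed: TTC ~ dt / (s - 1), s = w_curr / w_prev."""
    s = w_curr / w_prev
    if s <= 1.0:
        return float("inf")  # object not approaching (shrinking or static)
    return dt / (s - 1.0)

# Box width grows from 100 px to 105 px over 0.1 s -> TTC of about 2 s.
print(round(ttc_from_scale(100.0, 105.0, 0.1), 2))
```

This also illustrates why small-TTC cases are rare and valuable: they require large frame-to-frame scale changes, which the paper augments via neural rendering.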

    Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models

    The recent performance leap of Large Language Models (LLMs) opens up new opportunities across numerous industrial applications and domains. However, erroneous generations, such as false predictions, misinformation, and hallucinations produced by LLMs, have also raised severe concerns about the trustworthiness of LLMs, especially in safety-, security-, and reliability-sensitive scenarios, potentially hindering real-world adoption. While uncertainty estimation has shown its potential for interpreting the prediction risks of general machine learning (ML) models, little is known about whether, and to what extent, it can help explore an LLM's capabilities and counteract its undesired behavior. To bridge this gap, in this paper we initiate an exploratory study on the risk assessment of LLMs from the lens of uncertainty. In particular, we experiment with twelve uncertainty estimation methods and four LLMs on four prominent natural language processing (NLP) tasks to investigate to what extent uncertainty estimation techniques can help characterize the prediction risks of LLMs. Our findings validate the effectiveness of uncertainty estimation for revealing LLMs' uncertain or non-factual predictions. Beyond general NLP tasks, we also conduct extensive experiments with four LLMs on code generation over two datasets. We find that uncertainty estimation can potentially uncover buggy programs generated by LLMs. Insights from our study shed light on the future design and development of reliable LLMs, facilitating further research toward enhancing their trustworthiness. Comment: 20 pages, 4 figures
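One common family of uncertainty measures, sample-based predictive entropy over repeated generations, can be sketched as follows. This is an illustrative instance of the general technique, not a reproduction of any of the paper's twelve methods:

```python
import math
from collections import Counter

def predictive_entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of answers
    produced by repeated sampling; higher entropy = less confident model."""
    counts = Counter(samples)
    n = len(samples)
    if len(counts) == 1:
        return 0.0  # all samples agree: zero uncertainty
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# The model always gives the same answer -> zero uncertainty.
print(predictive_entropy(["Paris"] * 4))          # 0.0
# The model flip-flops between two answers -> maximal binary entropy.
print(predictive_entropy(["Paris", "Lyon"] * 2))  # 1.0
```

High-entropy inputs are candidates for the uncertain or non-factual predictions the study aims to flag.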

    Question Decomposition Tree for Answering Complex Questions over Knowledge Bases

    Knowledge base question answering (KBQA) has attracted a lot of interest in recent years, especially for complex questions that require multiple facts to answer. Question decomposition is a promising way to answer such questions. Existing decomposition methods split a question into sub-questions according to a single compositionality type, which is not sufficient for questions involving multiple compositionality types. In this paper, we propose the Question Decomposition Tree (QDT) to represent the structure of complex questions. Inspired by recent advances in natural language generation (NLG), we present a two-staged method called Clue-Decipher to generate QDTs. It leverages the strong generation ability of NLG models while preserving the original questions. To verify that QDT can enhance the KBQA task, we design a decomposition-based KBQA system called QDTQA. Extensive experiments show that QDTQA outperforms previous state-of-the-art methods on the ComplexWebQuestions dataset. Besides, our decomposition method improves an existing KBQA system by 12% and sets a new state of the art on LC-QuAD 1.0. Comment: Accepted by AAAI202
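The tree representation itself can be illustrated with a minimal sketch. This is a hypothetical data structure, not the paper's implementation: inner nodes carry a compositionality type, and leaves hold atomic sub-questions:

```python
class QDTNode:
    """One node of a question-decomposition tree (illustrative sketch)."""
    def __init__(self, label, children=None):
        self.label = label               # composition type, or a sub-question
        self.children = children or []   # empty list for leaf sub-questions

    def leaves(self):
        """Collect atomic sub-questions left-to-right for downstream KBQA."""
        if not self.children:
            return [self.label]
        return [q for c in self.children for q in c.leaves()]

# A question mixing two compositionality types (conjunction + superlative).
tree = QDTNode("conjunction", [
    QDTNode("Who directed Inception?"),
    QDTNode("superlative", [QDTNode("Which of their films earned most?")]),
])
print(tree.leaves())
```

Because each inner node has its own type, one tree can mix compositionality types, which a single flat split cannot express.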

    Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

    Recently, large language models (LLMs) have made significant advancements in natural language understanding and generation. However, their potential in computer vision remains largely unexplored. In this paper, we introduce a new, exploratory approach that enables LLMs to process images using the Scalable Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components. Our method facilitates simple image classification, generation, and in-context learning using only LLM capabilities. We demonstrate the promise of our approach across discriminative and generative tasks, highlighting its (i) robustness against distribution shift, (ii) substantial improvements achieved by tapping into the in-context learning abilities of LLMs, and (iii) image understanding and generation capabilities with human guidance. Our code, data, and models are available at https://github.com/mu-cai/svg-llm.
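The core observation, that an SVG image is just XML text an LLM can read, can be sketched as follows. The shapes, helper name, and prompt are illustrative assumptions, not the paper's pipeline:

```python
def shapes_to_svg(shapes, size=32):
    """Serialize (cx, cy, r) circles into an SVG document string; because
    the result is plain XML text, it can be placed directly in an LLM prompt."""
    elems = "".join(
        f'<circle cx="{x}" cy="{y}" r="{r}"/>' for (x, y, r) in shapes
    )
    return f'<svg width="{size}" height="{size}">{elems}</svg>'

svg_text = shapes_to_svg([(8, 8, 4), (24, 24, 4)])
# The textual image can now be embedded in an ordinary text prompt.
prompt = f"Classify the object drawn by this SVG:\n{svg_text}"
print(svg_text)
```

No vision encoder is involved: classification, editing, and in-context examples all operate on the SVG string itself.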

    Molecular state interpretation of charmed baryons in the quark model

    Stimulated by the observation of $\Lambda_c(2910)^+$ by the Belle Collaboration, the $S$-wave $qqq\bar{q}c$ ($q=u$ or $d$) pentaquark systems with $I=0$ and $J^P=\frac{1}{2}^-$, $\frac{3}{2}^-$, and $\frac{5}{2}^-$ are investigated in the framework of the quark delocalization color screening model (QDCSM). The real-scaling method is utilized to identify bound states and genuine resonance states. The root mean square of the cluster spacing is also calculated to study the structure of the states and to estimate whether a state is a resonance. The numerical results show that $\Lambda_c(2910)$ cannot be interpreted as a molecular state, and $\Sigma_c(2800)$ cannot be explained as the $ND$ molecular state with $J^P=\frac{1}{2}^-$. $\Lambda_c(2595)$ can be interpreted as a molecular state with $J^P=\frac{1}{2}^-$ whose main component is $\Sigma_c\pi$. $\Lambda_c(2625)$ can be interpreted as a molecular state with $J^P=\frac{3}{2}^-$ whose main component is $\Sigma_c^{*}\pi$. $\Lambda_c(2940)$ is likely a molecular state with $J^P=\frac{3}{2}^-$ whose main component is $ND^{*}$. Besides, two new molecular states are predicted: a $J^P=\frac{3}{2}^-$ $\Sigma_c\rho$ resonance state with a mass around 3140 MeV, and a $J^P=\frac{5}{2}^-$ $\Sigma_c^{*}\rho$ state with a mass of 3188.3 MeV. Comment: 12 pages, 3 figures

    LUNA: A Model-Based Universal Analysis Framework for Large Language Models

    Over the past decade, Artificial Intelligence (AI) has achieved great success and is being used in a wide range of academic and industrial fields. More recently, LLMs have made rapid advancements that have propelled AI to a new level, enabling even more diverse applications and industrial domains, particularly in areas like software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns and issues exhibited by LLMs have received much attention; without proper solutions, the widespread adoption of LLMs in practice could be greatly hindered. The distinctive characteristics of LLMs, such as the self-attention mechanism, the extremely large model scale, and the autoregressive generation scheme, differ from classic AI software based on CNNs and RNNs and present new challenges for quality analysis. To date, universal and systematic analysis techniques for LLMs are still lacking despite the urgent industrial demand. Toward bridging this gap, we initiate an early exploratory study and propose LUNA, a universal analysis framework for LLMs, designed to be general and extensible, enabling versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner. In particular, we first leverage data from the desired trustworthiness perspective to construct an abstract model as an auxiliary analysis asset, supported by various abstract model construction methods. To assess the quality of the abstract model, we collect and define a number of evaluation metrics at both the abstract-model level and the semantics level. The semantics, i.e., the degree to which the LLM satisfies the trustworthiness perspective, is then bound to the abstract model, enriching it and enabling more detailed analysis applications for diverse purposes. Comment: 44 pages, 9 figures
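The abstract-model construction step can be illustrated with a heavily simplified sketch: map scalar summaries of concrete model states onto a few abstract states (here by uniform binning, one of many possible abstraction methods) and count transitions between them along a trace. All names and the binning scheme are assumptions for illustration, not LUNA's implementation:

```python
from collections import defaultdict

def abstract_state(value, n_bins=3, lo=0.0, hi=1.0):
    """Bin a scalar summary of a hidden state into an abstract state id."""
    b = int((value - lo) / (hi - lo) * n_bins)
    return min(max(b, 0), n_bins - 1)  # clamp edge values into valid bins

def transition_counts(trace, n_bins=3):
    """Count abstract-state transitions along one generation trace; normalized
    counts would give a discrete-time Markov chain over abstract states."""
    states = [abstract_state(v, n_bins) for v in trace]
    counts = defaultdict(int)
    for a, b in zip(states, states[1:]):
        counts[(a, b)] += 1
    return dict(counts)

print(transition_counts([0.1, 0.2, 0.5, 0.9]))
```

Semantics from a trustworthiness perspective (e.g., a per-step factuality score) could then be attached to each abstract state to support the kind of detailed analysis the abstract describes.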

    Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model

    Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. In recent years, there has been a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we propose a unified framework that addresses both reaction representation learning and molecule generation tasks, which allows for a more holistic approach. Inspired by organic chemistry mechanisms, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. By possessing chemical knowledge, our generative framework overcomes the limitations of current molecule generation models that rely on a small number of reaction templates. In extensive experiments, our model generates synthesizable, drug-like structures of high quality. Overall, our work presents a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.